\chapter{Introduction}

The \CC\,standard library is arguably quite limited in its functionality. However, when it comes to data and number crunching, the \CC\,standard library provides a versatile toolkit of algorithms.

If you are a \CC\,developer, good familiarity with \CC\,standard algorithms can save you a lot of effort and accidental bugs. Notably, whenever you see a raw loop in your code, you should question whether calling a standard algorithm wouldn't be a better solution (it usually is).

\section{History of standard \texorpdfstring{\CC}{C++} algorithms}

While each \CC\,standard introduced new algorithms or variants, there are few notable milestones in the history of \CC\,standard algorithms.

The \CC98 standard introduced most of the algorithms. However, it was the \CC11 standard with its introduction of lambdas that made algorithms worthwhile. Before lambdas, the time investment of writing a custom function object made the usefulness of algorithms dubious.

\raggedbottom
\begin{codebox}[]{\href{https://godbolt.org/z/73r9n1WYM}{\ExternalLink}}
\footnotesize Example of \cpp{std::for_each}\index{\cpp{std::for_each}} algorithm with a custom function object, calculating the number of elements and their sum.
\tcblower
\cppfile{code_examples/introduction/history_cc98_code.h}
\end{codebox}

\begin{codebox}[]{\href{https://godbolt.org/z/zv4fWbT3z}{\ExternalLink}}
\footnotesize Example of \cpp{std::for_each}\index{\cpp{std::for_each}} algorithm with a capturing lambda, calculating the number of elements and their sum.
\tcblower
\cppfile{code_examples/introduction/history_cc11_code.h}
\end{codebox}

The \CC17 standard introduced parallel algorithms that provide an easy way to speed up processing with minimal effort. All you need to do is to specify the desired execution model, and the library will take care of parallelizing the execution.

\begin{codebox}[]{\href{https://godbolt.org/z/1nK5944K7}{\ExternalLink}}
\footnotesize Example of \cpp{std::for_each}\index{\cpp{std::for_each}} algorithm using unsequenced parallel execution model. Note that counters are now shared state and need to be \cpp{std::atomic}\index{\cpp{std::atomic}} or protected by a \cpp{std::mutex}\index{\cpp{std::mutex}}.
\tcblower
\cppfile{code_examples/introduction/history_cc17_code.h}
\end{codebox}

Finally, the \CC20 standard introduced a significant re-design in the form of ranges and views. Range versions of algorithms can now operate on ranges instead of \cpp{begin} and \cpp{end} iterators and views provide lazily evaluated versions of algorithms and utilities.

\begin{codebox}[]{\href{https://godbolt.org/z/57TGnzzxc}{\ExternalLink}}
\footnotesize Example of the range version of the \cpp{std::for_each}\index{\cpp{std::for_each}} algorithm.
\tcblower
\cppfile{code_examples/introduction/history_cc20_code.h}
\end{codebox}

As of the time of writing, the \CC23 standard is not finalized. However, we already know that it will introduce more range algorithms, more views and the ability to implement custom views.

\section{Iterators and ranges}

Algorithms operate on data structures, which poses an issue. How do you abstract the implementation details of a specific data structure and allow the algorithm to work with any data structure that satisfies the algorithm's requirements?

The \CC\, standard library solution to this problem are iterators and ranges. Iterators encapsulate implementation details of data structure traversal and simultaneously expose a set of operations possible on the given data structure in constant time and space.

A range is then denoted by a pair of iterators, or more generally, since \CC20, an iterator and a sentinel. In mathematical terms, a pair of iterators \cpp{it1}, \cpp{it2} denotes a range \cpp{[it1, it2)}, that is, the range includes the element referenced by \cpp{it1} and ends before the element referenced by \cpp{it2}.

To reference the entire content of a data structure, we can use the \cpp{begin()} and \cpp{end()} methods that return an iterator to the first element and an iterator one past the last element, respectively. Hence, the range \cpp{[begin, end)} contains all data structure elements.

\begin{codebox}[]{\href{https://godbolt.org/z/Wj4YjE51v}{\ExternalLink}}
\footnotesize Example of specifying a range using two iterators.
\tcblower
\cppfile{code_examples/introduction/iterators_code.h}
\end{codebox}

Sentinels follow the same idea. However, they do not need to be of an iterator type. Instead, they only need to be comparable to an iterator. The exclusive end of the range is then the first iterator that compares equal to the sentinel.

\begin{codebox}[]{\href{https://godbolt.org/z/qKrMo7scn}{\ExternalLink}}
\footnotesize Example of specifying a range using an iterator and custom sentinel. The sentinel will compare \cpp{true} with iterators at least the given distance from the start iterator, therefore defining a range with the specified number of elements.
\tcblower
\cppfile{code_examples/introduction/sentinels_code.h}
\end{codebox}

\subsection{Iterator categories}

The set of operations that are possible in constant time and space defines the following categories of iterators (and consequently ranges):

\begin{description}
    \item[input/output iterator] read/write each element once, advance\\
    \textit{data streams, e.g. writing/reading data to/from a network socket}
    \item[forward iterator] read/write each element repeatedly, advance\\
    \textit{singly-linked list, e.g. std::forward\_list}\index{\cpp{std::forward_list}}
    \item[bidirectional iterator] forward iterator + move back\\
    \textit{doubly-linked list, e.g. std::list, std::map, std::set}\index{\cpp{std::list}}\index{\cpp{std::map}}\index{\cpp{std::set}}
    \item[random access iterator] bidirectional iterator + advance and move back by any integer and calculate distance between two iterators\\
    \textit{multi-array data structures, e.g. std::deque}\index{\cpp{std::deque}}
    \item[contiguous iterator] random access iterator + the storage of elements is contiguous\\
    \textit{arrays, e.g. std::vector}\index{\cpp{std::vector}}
\end{description}

\begin{codebox}[]{\href{https://godbolt.org/z/aWT1Wajva}{\ExternalLink}}
\footnotesize Example demonstrating the difference between a random access iterator provided by \cpp{std::vector} and a bidirectional iterator provided by \cpp{std::list}.
\tcblower
\cppfile{code_examples/introduction/categories_code.h}
\end{codebox}

\subsection{Range categories}

Ranges can be classified using the same categories as iterators. In this book, we will be using the range nomenclature over iterators (e.g. input range, forward range, bidirectional range, etc.).

\section{Naming and common behaviour}

While the naming of many algorithms is sub-optimal, there are a few common naming patterns.

\subsection{Counted variants "\texttt{\_n}"}

Counted variants of algorithms accept the range specified using the start iterator and the number of elements (instead of begin and end). This behaviour can be a convenient alternative when working with input and output ranges, where we often do not have an explicit end iterator.\\

\noindent Examples: \cpp{std::for_each_n}, \cpp{std::copy_n}\\\index{\cpp{std::for_each_n}}\index{\cpp{std::copy_n}}

\noindent Note: while \cpp{std::search_n} does follow the naming, it does not follow the same semantics. The \texttt{\_n} here refers to the number of instances of the searched element.\index{\cpp{std::search_n}}

\subsection{Copy variants "\texttt{\_copy}"}

Copy variants of in-place algorithms do not write their output back to the source range. Instead, they output the result to one or more output ranges, usually defined by a single iterator denoting the first element to be written to (the number of elements is implied from the source range). The copy behaviour allows these variants to operate on immutable ranges.\\

\noindent Examples: \cpp{std::remove_copy}, \cpp{std::partial_sort_copy}\index{\cpp{std::remove_copy}}\index{\cpp{std::partial_sort_copy}}

\subsection{Predicate variants "\texttt{\_if}"}

Predicate variants of algorithms use a predicate to determine a "match" instead of comparing against a value. The standard also has one instance of \texttt{\_if\_not} variant that inverts the predicate logic (\cpp{false} is treated as a match).\\

\noindent Examples: \cpp{std::find_if}, \cpp{std::replace_if}\index{\cpp{std::find_if}}\index{\cpp{std::replace_if}}

\subsection{Restrictions on invocable}

Many algorithms can be customized using an invocable. However, with a few exceptions, the invocable is not permitted to modify elements of the range or invalidate iterators. On top of that, unless explicitly noted, the algorithms do not guarantee any particular order of invocation.

These restrictions in practice mean that the passed invocable must be regular. The invocable must return the same result if invoked again with the same arguments. This definition permits accessing a global state such as a cache but does not permit invocables that change their result based on their internal state (such as generators).

\section{A simpler mental model for iterators}

It can be tricky to internalize all the rules associated with iterators when working with standard algorithms. One shorthand that can help is to think in terms of ranges instead of iterators.

Ranges passed in as arguments are usually apparent, typically specified by pair of iterators.

\begin{codebox}[]{\href{https://godbolt.org/z/zTh9rn5E4}{\ExternalLink}}
\footnotesize Example with two ranges passed in as an argument. The input range is fully specified, and the end iterator for the output range is implied from the number of elements in the input range.
\tcblower
\cppfile{code_examples/introduction/mental_range_code.h}
\end{codebox}

The returned range can also be evident from the semantics of the algorithm.

\begin{codebox}[]{\href{https://godbolt.org/z/escqfWsnE}{\ExternalLink}}
\footnotesize Example of \cpp{std::is_sorted_until} that returns an iterator to the first out-of-order element, which can also be thought as the end iterator for a maximal sorted sub-range.
\tcblower
\cppfile{code_examples/introduction/mental_sorted_code.h}
\end{codebox}

The benefit of thinking about the returned value as the end iterator of a range is that it removes the potential for corner cases. For example, what if the algorithm doesn't find any element out of order? The returned value will be the end iterator of the source range, meaning that the range returned is simply the entire source range.

\newpage

In some cases, a single returned iterator denotes multiple ranges.

\begin{codebox}[]{\href{https://godbolt.org/z/YEGKPToz7}{\ExternalLink}}
\footnotesize Example of \cpp{std::lower_bound} that splits the range into two sub-ranges.
\tcblower
\cppfile{code_examples/introduction/mental_two_code.h}
\end{codebox}

Even when the algorithm returns an iterator to a specific element, it might be worth considering the implied range.

\begin{codebox}[]{\href{https://godbolt.org/z/Er15W6K5W}{\ExternalLink}}
\footnotesize Example of \cpp{std::find} establishing a prefix range that doesn't contain the searched-for element.
\tcblower
\cppfile{code_examples/introduction/mental_find_code.h}
\end{codebox}